This small dataset compares spectral measures generated by both PraatSauce v0.2.2 and VoiceSauce v1.31 at 1 msec intervals for 9 White Hmong lexical items spoken by a single female speaker. The original audio files can be found here. For both scripts, 5 formants were estimated with a maximum formant frequency of 5500 Hz; minimum and maximum F0 values were set to 50 Hz and 600 Hz for all F0 estimators. For VoiceSauce, the STRAIGHT F0 estimate and Snack formant/bandwidth estimates were used for harmonic amplitude corrections.
The method column indicates whether the formant bandwidths were estimated using Praat (PraatSauce) or Snack (VoiceSauce), or whether the Hawks and Miller formula was used.
In Hmong orthography, final -g indicates a low-falling breathy tone, while -m indicates creaky tone.
head(df)
## Filename Item Label seg_Start seg_End t_ms t method
## 1 25e-cab-w_Audio cab a 585.504 868.719 585.504 0.000000000 formula
## 2 25e-cab-w_Audio cab a 585.504 868.719 586.504 0.003546099 formula
## 3 25e-cab-w_Audio cab a 585.504 868.719 587.504 0.007092199 formula
## 4 25e-cab-w_Audio cab a 585.504 868.719 588.504 0.010638298 formula
## 5 25e-cab-w_Audio cab a 585.504 868.719 589.504 0.014184397 formula
## 6 25e-cab-w_Audio cab a 585.504 868.719 590.504 0.017730496 formula
## script measure value corrected
## 1 PraatSauce pF0 247.514 uncorrected
## 2 PraatSauce pF0 247.966 uncorrected
## 3 PraatSauce pF0 248.418 uncorrected
## 4 PraatSauce pF0 248.870 uncorrected
## 5 PraatSauce pF0 249.322 uncorrected
## 6 PraatSauce pF0 249.774 uncorrected
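The t column in the output above appears to be t_ms rescaled to the unit interval within each token. A minimal sketch of that rescaling follows, assuming t runs from 0 at the first frame to 1 at the last frame, and that the token spans 282 ms of frames. The 282 ms figure is inferred from the printed values, not stated anywhere in the text.

```r
# First six t_ms values for the token "cab", copied from the output above
t_ms <- c(585.504, 586.504, 587.504, 588.504, 589.504, 590.504)

# Assumption: frames every 1 ms, with the last frame 282 ms after the first.
# This span is inferred from the printed t values, not documented.
t_first <- 585.504
t_last  <- t_first + 282

# Rescale time to [0, 1] within the token
t_norm <- (t_ms - t_first) / (t_last - t_first)
print(round(t_norm, 9))
```

If the assumption holds, these values should match the t column printed by head(df).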
In the plots which follow, the PraatSauce measures are unsmoothed. If you want to compare to smoothed estimates, uncomment the two lines:
ps.fbw <- cbind(ps.fbw[1:6], apply(ps.fbw[7:43], 2, filter, filter=f21, sides=2))
ps.ebw <- cbind(ps.ebw[1:6], apply(ps.ebw[7:43], 2, filter, filter=f21, sides=2))
This implements a symmetric kernel filter, which differs from what VoiceSauce does. VoiceSauce uses the Matlab filter() function, which by default is a lag (one-sided) filter that pads with zeros. So, writing \(x_i\) for the \(i\)-th unsmoothed sample, while the smoothed value of sample 20 is \(\frac{1}{20}\sum_{i=1}^{20} x_i\), the smoothed value of sample 19 is not undefined, but is calculated as \(\frac{1}{20}\sum_{i=1}^{19} x_i\), the smoothed value of sample 18 is \(\frac{1}{20}\sum_{i=1}^{18} x_i\), etc.
If you want to smooth the Matlab way, use the lag kernel by selecting filter=f20 and setting sides=1.
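The difference between the two smoothing schemes can be sketched on a toy signal. This assumes that f21 and f20 are plain moving-average kernels (their definitions are not shown here) and that filter refers to stats::filter. The helper matlab_like() is hypothetical and is not part of PraatSauce or VoiceSauce. Note that stats::filter with sides=1 returns NA for the first 19 samples rather than zero-padding them, so reproducing the Matlab left-edge behaviour requires prepending zeros explicitly.

```r
# Assumed kernel definitions (not shown in the original text)
f21 <- rep(1 / 21, 21)  # symmetric 21-point moving average
f20 <- rep(1 / 20, 20)  # 20-point lag (one-sided) moving average

x <- as.numeric(1:40)   # toy signal standing in for one measure column

# Symmetric smoothing: first and last 10 samples are NA
sym <- stats::filter(x, filter = f21, sides = 2)

# One-sided smoothing in R: first 19 samples are NA, not zero-padded
lag <- stats::filter(x, filter = f20, sides = 1)

# Hypothetical helper: emulate a zero-padded lag filter by prepending
# length(k) - 1 zeros, filtering, and dropping the padded positions
matlab_like <- function(x, k) {
  n <- length(k)
  padded <- c(rep(0, n - 1), x)
  out <- stats::filter(padded, filter = k, sides = 1)
  as.numeric(out[n:length(padded)])
}

ml <- matlab_like(x, f20)
```

With zero padding, the early samples are pulled toward zero (sample 19 is the sum of the first 19 values divided by 20), which is the left-edge distortion described above.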
All F0 estimators except for STRAIGHT have difficulty with the somewhat constricted vowel quality of cav ‘to argue’.
Compared to the formula estimates, PraatSauce estimated bandwidths are huge…
… but VoiceSauce Praat-estimated bandwidths are an order of magnitude huger.
VoiceSauce’s Snack estimates (if that’s really what they are) look less erratic.
PraatSauce estimates are not completely off from Snack’s.
Note that the choice of bandwidth estimator is irrelevant here.
The middle third of cav is a real problem for PraatSauce (at least with the chosen settings).
The higher-order harmonics are not as much of a problem.
VoiceSauce estimates are consistently 20-25 dB lower than the PraatSauce estimates, and are sometimes negative, which seems…strange. This suggests to me they are being attenuated somewhere, though I have not been able to find the piece of code where this happens.
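One way to quantify such an offset is to pair the two scripts' values frame by frame and average the difference. The following is a minimal sketch on toy data, assuming the long format shown in head(df). The measure label "H1u" and all values are invented for illustration, and frames from the two scripts may first need to be aligned on t_ms in real data.

```r
# Toy long-format data using column names from head(df).
# The values and the measure label "H1u" are invented for illustration only.
toy <- data.frame(
  Item    = "cab",
  t_ms    = rep(c(585.504, 586.504, 587.504), times = 2),
  script  = rep(c("PraatSauce", "VoiceSauce"), each = 3),
  measure = "H1u",
  value   = c(30, 31, 32, 8, 9, 10)
)

keep <- c("Item", "t_ms", "measure", "value")
ps <- toy[toy$script == "PraatSauce", keep]
vs <- toy[toy$script == "VoiceSauce", keep]

# Pair frames from the two scripts by item, time, and measure
both <- merge(ps, vs, by = c("Item", "t_ms", "measure"),
              suffixes = c(".ps", ".vs"))

# Per-frame difference in dB, then the mean offset per item and measure
both$diff_dB <- both$value.ps - both$value.vs
offset <- aggregate(diff_dB ~ Item + measure, data = both, FUN = mean)
print(offset)
```

A roughly constant diff_dB across frames would point to a fixed scaling difference between the scripts rather than a tracking problem.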
Here, choice of formant bandwidth estimator potentially matters.
In these plots, PraatSauce is using Praat and VoiceSauce is using Snack estimates.
For VoiceSauce, the effect of using estimated bandwidths is virtually unnoticeable:
For PraatSauce, using the formula bandwidths makes only very minor differences:
More interesting is probably a comparison of the corrected differences.
PraatSauce seems to have higher difference estimates for (some of) the -g items.
The issue with the middle third of cav might be regarded as positive if this token is really being produced with nonmodal voice.
Praat(Sauce) estimates are comparable if smoothed.
Only HNR05 and HNR15 are shown here, for clarity.
Again, the Praat estimates differ in amplitude, but maintain roughly the same trajectories. However, the PraatSauce implementation is much less sophisticated than that of VoiceSauce, and relies entirely on Praat’s To Harmonicity... function.
This is obviously a tiny sample and so firm conclusions cannot be drawn. However, some observations:
Praat F0 estimation is generally pretty OK
PraatSauce’s harmonic amplitude detection is not as robust/smooth - should be investigated further
VoiceSauce’s smoothing (by dint of Matlab’s filter() behaviour) can do strange things to the left edges. But estimates of anything at the edges are probably unreliable anyways
VoiceSauce may correct too strongly: at least, things that ‘should be’ breathy are sometimes estimated as having smaller tilts vis-a-vis PraatSauce
Formula vs. Praat/Snack bandwidth estimation doesn’t seem to have a huge impact on corrections. This is probably because the bandwidth only enters the I&A (Iseli and Alwan) correction formula in the term \(e^{-\pi B_i/F_s}\), so even changes of an order of magnitude do not radically affect the output
Not only do different spectral measures appear to be better at distinguishing VQ-based contrasts in different languages, but different measures also do better for different vowels/tokens/speakers?
The effects of binning and window size have not been investigated.
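The point above about bandwidth entering the correction only through \(e^{-\pi B_i/F_s}\) can be illustrated numerically. The following is a minimal sketch written from the commonly cited form of the Iseli and Alwan correction; it has not been checked against the source of either script. The function ia_correction() is a hypothetical helper, and the sampling rate, harmonic frequency, and formant frequency are illustrative values, not taken from the dataset.

```r
# Correction (in dB) for one harmonic at frequency f (Hz) due to one formant
# with centre frequency Fx (Hz) and bandwidth Bx (Hz), at sampling rate Fs (Hz).
# Hypothetical helper, not code from PraatSauce or VoiceSauce.
ia_correction <- function(f, Fx, Bx, Fs) {
  r       <- exp(-pi * Bx / Fs)   # the only place the bandwidth enters
  omega_x <- 2 * pi * Fx / Fs
  omega   <- 2 * pi * f / Fs
  num <- r^2 + 1 - 2 * r * cos(omega_x)
  a   <- r^2 + 1 - 2 * r * cos(omega_x + omega)
  b   <- r^2 + 1 - 2 * r * cos(omega_x - omega)
  20 * log10(num) - 10 * log10(a) - 10 * log10(b)
}

# Illustrative values (assumptions, not from the dataset)
Fs <- 16000   # sampling rate in Hz
f  <- 250     # harmonic frequency in Hz
F1 <- 800     # formant frequency in Hz

# A tenfold change in bandwidth: 100 Hz vs 1000 Hz
r_narrow    <- exp(-pi * 100 / Fs)
r_wide      <- exp(-pi * 1000 / Fs)
corr_narrow <- ia_correction(f, F1, 100, Fs)
corr_wide   <- ia_correction(f, F1, 1000, Fs)

print(c(r_narrow = r_narrow, r_wide = r_wide))
print(c(corr_narrow = corr_narrow, corr_wide = corr_wide,
        difference = corr_narrow - corr_wide))
```

For a harmonic well below the formant, as here, the two corrections stay close together. The sensitivity would be expected to grow for harmonics lying near a formant peak, so this sketch should not be read as covering every measure.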